[libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions #152436

wenju-he · 2025-08-07T05:03:26Z

Motivation is to upstream use of _ocml_rsqrt in
https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math/rsqrt.cl

llvm-diff shows vectorized calls of llvm.sqrt.v2f32 and fdiv are scalarized: Old:
> %2 = tail call contract <2 x float> @llvm.sqrt.v2f32(<2 x float> %0), !fpmath !5
> %3 = fdiv contract <2 x float> splat (float 1.000000e+00), %2, !fpmath !4
!4 = !{float 2.500000e+00}
!5 = !{float 3.000000e+00}
New:
< %2 = extractelement <2 x float> %0, i64 0
< %3 = tail call float @__ocml_rsqrt_f32(float noundef %2)
< %4 = insertelement <2 x float> poison, float %3, i64 0
< %5 = extractelement <2 x float> %0, i64 1
< %6 = tail call float @__ocml_rsqrt_f32(float noundef %5)
< %7 = insertelement <2 x float> %4, float %6, i64 1

Motivation is to upstream use of __ocml_rsqrt_ in https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/amdgcn-amdhsa/math/rsqrt.cl llvm-diff shows vectorized calls of llvm.sqrt.v2f32 and fdiv are scalarized: Old: > %2 = tail call contract <2 x float> @llvm.sqrt.v2f32(<2 x float> %0), !fpmath !5 > %3 = fdiv contract <2 x float> splat (float 1.000000e+00), %2, !fpmath !4 !4 = !{float 2.500000e+00} !5 = !{float 3.000000e+00} New: < %2 = extractelement <2 x float> %0, i64 0 < %3 = tail call float @__ocml_rsqrt_f32(float noundef %2) < %4 = insertelement <2 x float> poison, float %3, i64 0 < %5 = extractelement <2 x float> %0, i64 1 < %6 = tail call float @__ocml_rsqrt_f32(float noundef %5) < %7 = insertelement <2 x float> %4, float %6, i64 1

wenju-he · 2025-08-07T05:04:52Z

@arsenm I'm not sure if the change is an improvement, please review.

github-actions · 2025-08-07T05:05:55Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff HEAD~1 HEAD --extensions cl -- libclc/clc/lib/amdgcn/math/clc_rsqrt.cl

View the diff from clang-format here.

diff --git a/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl b/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
index 4a9ae94b7..69164fff7 100644
--- a/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
+++ b/libclc/clc/lib/amdgcn/math/clc_rsqrt.cl
@@ -10,13 +10,17 @@
 
 float __ocml_rsqrt_f32(float);
 
-_CLC_OVERLOAD _CLC_DEF float __clc_rsqrt(float x) { return __ocml_rsqrt_f32(x); }
+_CLC_OVERLOAD _CLC_DEF float __clc_rsqrt(float x) {
+  return __ocml_rsqrt_f32(x);
+}
 
 #ifdef cl_khr_fp64
 #pragma OPENCL EXTENSION cl_khr_fp64 : enable
 double __ocml_rsqrt_f64(double);
 
-_CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) { return __ocml_rsqrt_f64(x); }
+_CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) {
+  return __ocml_rsqrt_f64(x);
+}
 
 #endif
 
@@ -24,9 +28,7 @@ _CLC_OVERLOAD _CLC_DEF double __clc_rsqrt(double x) { return __ocml_rsqrt_f64(x)
 #pragma OPENCL EXTENSION cl_khr_fp16 : enable
 half __ocml_rsqrt_f16(half);
 
-_CLC_OVERLOAD _CLC_DEF half __clc_rsqrt(half x) {
-  return __ocml_rsqrt_f16(x);
-}
+_CLC_OVERLOAD _CLC_DEF half __clc_rsqrt(half x) { return __ocml_rsqrt_f16(x); }
 
 #endif

arsenm

ocml rsqrt should be deleted, this is implementable by

float rsqrt(float x) {
    #pragma clang fp contract(fast)
    return 1.0f / __builtin_elementwise_sqrt(x);
}

wenju-he · 2025-08-07T05:14:45Z

ocml rsqrt should be deleted, this is implementable by
float rsqrt(float x) {
    #pragma clang fp contract(fast)
    return 1.0f / __builtin_elementwise_sqrt(x);
}

thanks @arsenm. So current generic implementation is good:

llvm-project/libclc/clc/lib/generic/math/clc_rsqrt.inc

Lines 10 to 13 in eb0ddba

    
           __clc_rsqrt(__CLC_GENTYPE val) { 
        
           #pragma clang fp contract(fast) 
        
             return __CLC_FP_LIT(1.0) / __builtin_elementwise_sqrt(val); 
        
           }

Close this PR.

…#19722) Before this PR, libspirv-nvptx64--nvidiacl uses __nv_rsqrtf/isinf/copysign in these two functions, and libspirv-amdgcn--amdhsa uses __ocml_rsqrt/copysign. We can upstream use of target built-in to CLC library: Use of __nv_rsqrtf/isinf is to be upstreamed by llvm/llvm-project#150174. Then the changes to libspirv-nvptx64--nvidiacl should be minimized. Generic implementation of __clc_rsqrt is better than __ocml_rsqrt according to llvm/llvm-project#152436. Use of these target built-in scalarize llvm vector intrinsic. So it is likely better to use generic implementation. In addition, use of generic implementation aligns with OpenCL library. Delete libspirv/generic/math/floatn.inc. Delete unused clc/relational/floatn.inc and libspirv/generic/math/minmag.inc.

wenju-he requested a review from frasercrmck August 7, 2025 05:03

llvmbot added the libclc libclc OpenCL library label Aug 7, 2025

wenju-he requested a review from arsenm August 7, 2025 05:04

arsenm reviewed Aug 7, 2025

View reviewed changes

wenju-he closed this Aug 7, 2025

wenju-he deleted the libclc-__clc_rsqrt-amdgcn branch August 7, 2025 05:14

wenju-he mentioned this pull request Aug 7, 2025

[libspirv] reuse clc function in __spirv_ocl_normalize/fast_normalize intel/llvm#19722

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions #152436

[libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions #152436

Uh oh!

wenju-he commented Aug 7, 2025

Uh oh!

wenju-he commented Aug 7, 2025

Uh oh!

github-actions bot commented Aug 7, 2025

Uh oh!

arsenm left a comment

Uh oh!

wenju-he commented Aug 7, 2025

Uh oh!

Uh oh!

[libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions #152436

[libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions #152436

Uh oh!

Conversation

wenju-he commented Aug 7, 2025

Uh oh!

wenju-he commented Aug 7, 2025

Uh oh!

github-actions bot commented Aug 7, 2025

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

wenju-he commented Aug 7, 2025

Uh oh!

Uh oh!